Speech intonation for TTS: study on evaluation methodology

Authors

Javier Latorre

Kayoko Yanagisawa

Vincent Wan

BalaKrishna Kolluru

Mark J. F. Gales

Abstract

Intonation models are usually evaluated by means of non-referenced subjective tests (pair comparison or MOS), in which subjects rate the quality of samples or compare different samples without any explicit reference. These tests are usually conducted on isolated sentences. However, for a single sentence with no contextual information, there are multiple valid intonations, and a subject's preference over this range of intonation patterns may be highly personal. This paper investigates the degree to which this ambiguity in the appropriate intonation pattern affects the assessment of prosody for speech synthesis systems. To examine the problem, the variance of the F0 pattern of several vocoded sentences was modified, and subjects were asked to compare multiple versions with different levels of modification in terms of preference/quality. They were then presented with the reference that defines the original intonation and asked about the similarity of each version to that reference. The results show that subjects can identify the samples with no F0 variance modification when given a reference, but they do not always prefer them. Thus, non-referenced tests with no context, though they may help to analyse user acceptability, may not be appropriate for measuring the performance of intonation models.
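The abstract does not say how the F0 variance was modified. As a rough illustration only, one common way to change the variance of a pitch contour is to scale each voiced frame's deviation from the utterance mean. The sketch below assumes the scaling is done in the log-F0 domain and that unvoiced frames are marked with 0; both are assumptions, and the function name `scale_f0_variance` is hypothetical, not taken from the paper.

```python
import math


def scale_f0_variance(f0_hz, factor):
    """Scale the variance of a log-F0 contour by `factor`, keeping its mean.

    f0_hz:  per-frame F0 values in Hz; 0 marks an unvoiced frame
            (assumed convention, not specified in the abstract).
    factor: variance multiplier; 1.0 leaves the contour unchanged,
            0.0 flattens it to a monotone at the geometric-mean pitch.
    """
    if factor < 0:
        raise ValueError("factor must be non-negative")

    # Work in the log domain on voiced frames only (assumption).
    voiced_log = [math.log(f) for f in f0_hz if f > 0]
    if not voiced_log:
        return list(f0_hz)

    mean = sum(voiced_log) / len(voiced_log)
    # Scaling deviations by sqrt(k) scales the variance by k.
    gain = math.sqrt(factor)

    out = []
    for f in f0_hz:
        if f > 0:
            out.append(math.exp(mean + gain * (math.log(f) - mean)))
        else:
            out.append(0.0)  # keep unvoiced frames unvoiced
    return out
```

Under these assumptions, calling `scale_f0_variance(contour, 0.5)` would give a flatter version of the contour and `scale_f0_variance(contour, 2.0)` a more exaggerated one, which is the kind of graded set of stimuli the listening test describes.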


Related resources

Towards an intonation module for a portuguese TTS system

In this paper, a correlation between the linguistic structure of the written text and the real intonation behavior of the read speech in European Portuguese language (EP) is presented. It is our belief that intonation behavior in EP can be strongly predicted from two main coordinates: the syntactic structure of the sentence and its pragmatic communicative function, in one way, combined with the...


Comparison of chironomic stylization versus statistical modeling of prosody for expressive speech synthesis

Chironomic stylization is the process of real-time modification of intonation contours (f0 and tempo) using drawing/writing gestures with a stylus on a graphic tablet. The question addressed in this research is whether hand-made intonation stylization could improve or degrade expressivity and overall quality, compared to statistical modeling of prosody. A system for expressive TTS in French bas...


Maximum-likelihood dynamic intonation model for concatenative text-to-speech system

In this work we present a Maximum Likelihood (ML) joint pitch curve modeling, inspired by HMM TTS synthesis concept. This model provides an optimal solution for the coarse target intonation curve (3 points per syllable) and incorporates both static and dynamic pitch values for better utterance intonation modeling. The coarse intonation curve may be optionally combined with the original pitch ex...


Modeling of intonation bearing emphasis for TTS-synthesis of greek dialogues

TTS-synthesis of neutral style Greek with good intelligibility and quality has been achieved some time ago. As a further step towards expanding the applications domain of the TTS-system developed in our laboratory, the incorporation of emphasis into speech used in man-machine dialogues according to their context has been studied recently. In this paper the method applied for the analysis of int...


A joint prosody evaluation of French text-to-speech synthesis systems

This paper reports on prosodic evaluation in the framework of the EVALDA/EvaSy project for text-to-speech (TTS) evaluation for the French language. Prosody is evaluated using a prosodic transplantation paradigm. Intonation contours generated by the synthesis systems are transplanted on a common segmental content. Both diphone based synthesis and natural speech are used. Five TTS systems are tes...




Publication date: 2014
